Homework 1

Julia Kaznowska
Warsaw University of Technology
Faculty of Mathematics and Information Science
WB XAI tabular

Importing libraries

Loading model and test data

I am using a trained model from Kaggle (https://www.kaggle.com/code/ravichaubey1506/end-to-end-machine-learning/notebook). I needed help with this part of the homework, so I must give credit to Dawid Płudowski for showing me how to import the model into my report.

Model prediction

Creating an explainer

Choosing observations and predicting their values

Out of curiosity, I wanted to check whether the predicted values were anywhere near the real values.

The first predicted value is about 4% smaller than the real one, and the second is about 3% smaller. These are not bad predictions, at least for these two records.
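The percentages above are relative errors, (predicted - real) / real. A minimal sketch of that arithmetic follows; the numbers in `pairs` are hypothetical placeholders chosen to give roughly -4% and -3%, not the values from this report.

```python
def relative_error(predicted, actual):
    """Signed relative error; negative means the model under-predicts."""
    return (predicted - actual) / actual

# Hypothetical (predicted, real) pairs, NOT the values from this homework.
pairs = [(96_000.0, 100_000.0), (194_000.0, 200_000.0)]

errors = [relative_error(predicted, actual) for predicted, actual in pairs]
for error in errors:
    print(f"{error:+.1%}")
```

A negative sign means the prediction is smaller than the real value, which matches the situation described for both records.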

Breakdown plots

For the first chosen observation:

We can see that population has the biggest positive effect in the breakdown, whereas total_rooms has the biggest negative effect. Judging by their values, they nearly cancel each other out.

The second observation:

We can see that for the second record the breakdown plot is completely different. population has the biggest effect of all the variables, and it is negative, in contrast to the first record. The biggest positive effect, households, is significantly smaller in magnitude than the negative one.

Shapley values

First observation:

We can see that for the first observation the breakdown plot and the Shapley plot look similar. All the variables have the same sign of impact (positive or negative) in both of them. The top variables are also the same.

Second observation:

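A different situation appears in the case of the second observation. The latitude variable has a positive impact in the breakdown plot, whereas it has a negative impact in the Shapley plot. Breakdown contributions depend on the order in which variables are fixed, while Shapley values average over many orderings, so this sign change means that latitude is probably correlated with, or interacts with, another variable in the model.
We can also see that the top-impacting variables are different. Although population has the biggest negative impact in both the breakdown and the Shapley plot, the biggest positive one is different: in the Shapley plot it is longitude.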
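To illustrate how a variable can change sign between the two plots, here is a small self-contained sketch. It does not use the model or data from this report: `model`, `background` and `observation` are hypothetical toy stand-ins, and the Shapley values are computed exactly over all orderings rather than sampled as an explanation library would typically do.

```python
from itertools import permutations
from statistics import mean

# Hypothetical model with an interaction between the two coordinates.
def model(row):
    return row["longitude"] * row["latitude"]

# Hypothetical background data (illustrative, rescaled values only).
background = [
    {"longitude": 0.0, "latitude": 0.0},
    {"longitude": 2.0, "latitude": 2.0},
]

def expected_prediction(fixed):
    """Average prediction over the background with some variables fixed."""
    return mean(model({**row, **fixed}) for row in background)

def break_down(observation, order):
    """Contributions obtained by fixing variables one by one in `order`."""
    fixed, contributions = {}, {}
    previous = expected_prediction(fixed)
    for name in order:
        fixed[name] = observation[name]
        current = expected_prediction(fixed)
        contributions[name] = current - previous
        previous = current
    return contributions

def shapley(observation):
    """Average the break-down contributions over every possible ordering."""
    orders = list(permutations(observation))
    return {
        name: mean(break_down(observation, order)[name] for order in orders)
        for name in observation
    }

observation = {"longitude": 0.5, "latitude": 1.5}
one_order = break_down(observation, ("longitude", "latitude"))
averaged = shapley(observation)
print("break-down:", one_order)
print("shapley:   ", averaged)
```

In this toy setup latitude gets a positive contribution when it is fixed after longitude, but a negative one once all orderings are averaged. In both cases the contributions add up to the difference between the prediction for the observation and the average prediction.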

Unfortunately, I did not find out how to make Shapley plots with boxplots, which would make it easier to find potentially correlated variables. We can conclude, though, based on the second observation and latitude, that the geographic coordinates might be connected to each other. Correlated variables are worth further investigation.
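One simple way to start that investigation would be to compute the Pearson correlation between the two coordinates. The sketch below uses plain Python; the `longitude` and `latitude` lists are hypothetical sample values, not rows from the test data used here.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical coordinates, NOT taken from the homework data.
longitude = [-124.0, -122.0, -120.0, -118.0]
latitude = [40.0, 38.5, 36.0, 34.0]

r = pearson(longitude, latitude)
print(f"correlation between longitude and latitude: {r:.3f}")
```

A coefficient close to -1 or 1 on the real data would support the suspicion that the coordinates are connected, while a value near 0 would point towards an interaction inside the model instead.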